Efficient determination of cluster boundaries for analysis of gene expression profile data using hierarchical clustering and wavelet transform.

Authors

  • Harry Amri Moesa
  • Dukka Bahadur K C
  • Tatsuya Akutsu
Abstract

Existing methods for clustering gene expression profile data require either manual inspection and additional biological knowledge or a cut-off value that cannot be calculated directly from the given data set. Systematic and efficient determination of cluster boundaries in gene expression profile data therefore remains a challenging problem. In this context, we have developed a procedure for the automatic and systematic determination of cluster boundaries in the hierarchical clustering of gene expression data, based on the ratio of within-class variance and between-class variance, which can be calculated entirely from the given expression data. After a dendrogram is constructed by agglomerative hierarchical clustering, this ratio is used to determine the cluster boundaries. Unlike other existing approaches, our approach requires no manual inspection or biological knowledge; it relies only on this ratio, which is computed entirely from the given expression profile data. Our results compare favorably with, and in some cases are better than, those of an existing method that does not use prior information or manual inspection. Moreover, gene expression profile data are often contaminated with various types of noise. To reduce this noise, we also applied the discrete wavelet transform, a technique used in image enhancement. We tested a number of mother wavelet functions for smoothing the noise in the gene expression data set and obtained some improvements in the quality of the results.
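
The abstract does not give the exact statistic or stopping rule, so the following is only a minimal sketch of the general idea, not the authors' procedure. It builds an agglomerative hierarchy, scores each level with the Calinski-Harabasz index (a well-known criterion based on between-cluster and within-cluster variance, swapped in here as a stand-in for the paper's ratio), and keeps the best-scoring level. Average linkage on squared Euclidean distance, one-level Haar smoothing with hard thresholding, all function names, and the toy profiles are assumptions made for illustration.

```python
from itertools import combinations


def mean_profile(rows):
    """Column-wise mean of a list of equal-length profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def haar_smooth(signal, threshold):
    """One-level Haar DWT, hard-threshold the details, then invert (assumed scheme)."""
    if len(signal) % 2:
        raise ValueError("this sketch needs an even number of samples")
    s = 2 ** 0.5
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out


def agglomerate(profiles):
    """Average-linkage agglomeration; yields the partition after every merge."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        def linkage(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            total = sum(sq_dist(profiles[i], profiles[j]) for i in a for j in b)
            return total / (len(a) * len(b))
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        yield [list(c) for c in clusters]


def scatter(profiles, clusters):
    """Return (within-cluster, between-cluster) sums of squares for a partition."""
    grand = mean_profile(profiles)
    within = between = 0.0
    for members in clusters:
        rows = [profiles[i] for i in members]
        centre = mean_profile(rows)
        within += sum(sq_dist(r, centre) for r in rows)
        between += len(rows) * sq_dist(centre, grand)
    return within, between


def choose_cut(profiles):
    """Pick the dendrogram level with the highest Calinski-Harabasz score."""
    n = len(profiles)
    best_score, best_partition = float("-inf"), None
    for clusters in agglomerate(profiles):
        k = len(clusters)
        if k < 2:  # a single cluster has no between-cluster variance
            break
        within, between = scatter(profiles, clusters)
        score = (between / (k - 1)) / (within / (n - k)) if within > 0 else float("inf")
        if score > best_score:
            best_score, best_partition = score, clusters
    return best_partition, best_score


# Toy data (invented): two well-separated groups of three profiles each.
profiles = [
    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
    [5.0, 5.1], [5.1, 5.0], [5.0, 5.0],
]
best, score = choose_cut(profiles)
print(sorted(sorted(c) for c in best), score)
```

In this sketch the cut is derived from the data alone, which mirrors the motivation stated in the abstract. `haar_smooth` could be applied to each profile before clustering; a library such as PyWavelets would be the natural choice for comparing several mother wavelets.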

Similar resources

ASIAN: Automatic System for Inferring a Network from Gene Expression Profiles

Recently, we have developed two methods for expression profile analysis. One is the automatic determination of cluster boundaries in hierarchical clustering of profiles [2], and another is the inference of a genetic network by application of graphical Gaussian modeling (GGM) [4]. Here, we synthesize the newly developed methods into a system, named ASIAN (Automatic System for Inferring A Network...

Statistical estimation of cluster boundaries in gene expression profile data

MOTIVATION Gene expression profile data are rapidly accumulating due to advances in microarray techniques. The abundant data are analyzed by clustering procedures to extract the useful information about the genes inherent in the data. In the clustering analyses, the systematic determination of the boundaries of gene clusters, instead of by visual inspection and biological knowledge, still remai...

The Haar Wavelet Transform of a Dendrogram – I

While there is a very long tradition of approximating a data array by projecting row or column vectors into a lower-dimensional subspace, the direct approximation of a data matrix through smoothing is less common. Applications of data array smoothing include visualization; filtering of less relevant, and thus harder to interpret, values; and as a means towards compression. Wavelet smoothing or r...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Application of clustering methods in DNA microarrays

Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...

Journal title:
  • Genome informatics. International Conference on Genome Informatics

Volume 16, Issue 1

Pages: -

Publication date: 2005